A CRITICISM OF RECENT ATTEMPTS TO MEASURE
LANGUAGE ABILITY 



BAKER BROWNELL 
Further progress in language scales depends to a great extent
on an analysis of the term language ability. The term is used to
cover generally that which is measured by scales, such as, on the
one hand, Trabue's 1 completion test language scales and Starch's 2
grammar and punctuation scales, and, on the other hand, the Ballou 3
and Hillegas 4 composition scales, without careful distinction or
recognition of the
different kinds of material included. Working on the assumption 
that a child's language ability corresponds to the concrete result 
of his activity in composition, namely, his theme, investigators 
have proceeded to measure language ability by attempting to 
evaluate these concrete results and to derive scales by arranging 
in ascending or descending order type themes of definitely assigned 
values. There are obviously great difficulties accompanying such 
an attempt — first, in determining what a type theme is; secondly, 
in the operation of scientific comparison between the type theme 
and the theme to be graded. Both have been dealt with for the 
most part by a statistical summarizing of teachers' opinions, which 
in this case is eventually a doubtfully scientific method. Where 
analysis has been used to any extent, it has been in determining 
what are or should be type themes, a question which is psychological 
and pedagogical in so far as it relates to the theme-writer's capaci- 
ties, and statistical and objectively experimental so far as it refers 

1 M. R. Trabue, Completion Test Language Scales (Teachers College, Columbia 
University). 

2 Daniel Starch, "The Measurement of Achievement in English Grammar,"
Journal of Educational Psychology, VI, 615-26.

3 F. W. Ballou, "Scales for the Measurement of English Compositions,"
Harvard-Newton Bulletin, No. 2 (Harvard University).

4 M. B. Hillegas, "A Scale for the Measurement of Quality of English
Compositions," Teachers College Record (September, 1912).
to his themes. The operation of comparison of a theme with the
scale has received little attention, an analysis and definition of 
which would undoubtedly react toward changes in the scale itself. 
The defining of this operation will also necessitate psychological 
categories, such, for example, as Kayfetz 1 suggests with reference 
to the derivation of scales. Secondly, the operation of comparison 
in any scientific degree depends on an analysis of the materials com- 
pared and a co-ordination of these analytical elements. The two 
materials to be compared in this case are the written language or 
composition of the child's theme and the language of the scale 
theme. Though there are also other comparable elements, no 
doubt, the actual language elements are the chief. Such an analy- 
sis, could it be carried out, must react with great effect on the nature 
of language scales. 

There are, to repeat, these two operations necessary with respect 
to every language scale: the derivation of the scale and the using 
of the scale. While the derivation of scales has received a great 
deal of attention, the study of the using of scales has received little. 
Using the scale consists mainly in the comparison of a scale theme 
with a student's theme. This operation of comparison has been 
left relatively uninvestigated and unanalyzed. At least one 
necessity of scientific comparison is, as before said, an exact analysis 
of the materials compared. This means that language, with respect 
to language scales, cannot be taken uncritically, or generally, or 
only vaguely evaluated and analyzed, as in the past, and that 
language cannot be treated as a single and undifferentiated whole. 
Development of language scales and more particularly the use of 
language scales depend on a uniform and systematic analysis and 
evaluation of language elements. 

Language — and, much more, language ability — is from an 
important aspect a collective term. The multiplicity of the con- 
tent of the term language has been and will continue to be, until 
appreciated and definitely analyzed, a source of intellectual chaos 
in any set of scales that can be devised. Though the term language 
is unified by the fact that all language has one general function, it 

1 Isador Kayfetz, "A Critical Study of the Harvard-Newton Composition Scales,"
Pedagogical Seminary, XXIII, No. 3, p. 339.
is not less important to note that language is composed of many
and very different groups of elements foreign to each other except 
for their participation in the common function of human communi- 
cation. Leaving the further complication introduced by the word 
"ability," language alone from the point of view of the maker and 
especially the user of language scales has difficulties of analysis 
hitherto untouched. The meagerness and generality of the quite 
unsystematic notes, which are all that the Ballou scale gives for 
comparing the pupil's theme with the scale, illustrate this lack of 
analysis. The language has been left unanalyzed, merely the sub- 
ject of general opinion. With few exceptions language scales have 
used language as a lump term including various and vague com- 
ponent materials. The term language has been used from the 
standpoint of its definition as to function, not from its definition as 
to content and language structure. 

Trabue 1 is willing to admit frankly that he does not know what 
language ability is, but believes that his scale measures this lan- 
guage ability, whatever it is, fairly well. The logic of his method 
as of that of Ballou and Hillegas is clear from such a statement. 
These scales are attempting to measure the purely functional aspect 
of language. They are attempting to measure a child's ability to 
use language, his ability to make language perform its function, 
regardless, not only of what ability is, but of what definitely lan- 
guage is. It is an empirical method. It can merely correlate a 
child's performance with some type performance. What it is that 
he performs, what the materials of his performance are, whether 
two children are performing the same thing, is unanswered. 
Though specious results are obtained, they can by this method be 
founded only on intellectual confusion. It is educational herb 
doctoring, a method which cannot go far in science. Some analysis 
of the content of the term language from the point of view of the 
language scale is necessary. 

Special experimental study of one of the composition scales 
with reference to its adequacy as a means of measuring a child's 
language in written composition was carried out last year in 

1 M. R. Trabue, Completion Test Language Scales (Teachers College, Columbia 
University), Introduction. 
connection with the bureau of measurements at the Kansas Normal
School. In the study 1 it was found that language measurement by 
the empirical method used in composition scales was not only inade- 
quate and inapplicable as an analytical designation of measurable 
errors and defects in composition, but that the scale did not survive 
an empirical test similar to that used in its production aiming to 
test the scale in its ability to measure the functional success of 
language. The scale failed, in other words, not only to measure 
language ability with reference to the content of the term; it 
failed equally in measuring it with reference to function. The 
twenty-five opinions from Kansas did not agree with the twenty- 
five opinions from Massachusetts as to the relative successes of the 
compositions in fulfilling their function. This failure was found 
to be due, first, to a lack of analysis of those elements in composi- 
tion which were to be measured, an analysis necessary in order to 
give a common ground to teachers in the operation of comparing 
the child's theme with the scale; and, secondly, to a lack of co- 
ordination of these elements. 

Such scales evidently do not express the quality of themes, as 
Hillegas says, in the same sense, though not so accurately, as milli- 
meters express the length of lines. While a line is an absolutely 
simple analytical unit, having but one characteristic, length, with 
which a millimeter, a perfectly simple statement of abstract value 
in terms of length, can be repeatedly placed in one-to-one corre- 
spondence, thus measuring the line, these scales are composed of 
themes, quite unanalyzed as to language and necessarily complex — 
themes which have a multiplicity of variables within them, and 
which as wholes cannot be placed in one-to-one correspondence with 
other themes. When one teacher, for example, gives more weight 
to an error in grammar than to an error in spelling, when another 
believes that good thought will carry over many an error in spelling 
or punctuation, when another is impressionable to elegant sentence 
structure and another to choice of words, how, when all these 
elements and others are present unanalyzed in the scale theme, can 
this theme be set in one-to-one correspondence with another as a 

1 Baker Brownell, "A Test of the Ballou Scale of English Composition," School 
and Society, IV, 938-42. 
means of quantitative measurement? The fact that all these
elements and many more must be organized together to make 
language perform its function gives no basis to the assumption that 
a scale by lumping them all and proceeding to judge by general 
effect can successfully measure language ability. Scales cannot 
proceed by throwing all variables into one dish and measuring 
them in the mass. They cannot proceed in this way because 
variations in general effect may depend successively on quite differ- 
ent variables left undistinguished within the mass, thus making 
quantitative comparison, a necessity of measurement, impossible. 
A scale can successfully measure but one variable at a time; if 
more, the interrelations of the several variable elements must be 
known. A measurement of general effect, however, measures 
neither the elements nor their organization. Until it is known 
precisely what composes general effect there can be no success in 
standardizing it or in using it as a basis for comparison with chil- 
dren's themes. 
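
The point may be illustrated by a minimal sketch. Everything in it is hypothetical: the three themes, their element scores, the three judges, and the weights are invented for illustration and are drawn from no actual scale or study. It assumes only that each judge forms a "general effect" by silently weighting separately scorable elements in his own way.

```python
# Hypothetical illustration only: themes, scores, and weights are invented.

# Each theme scored 0-10 on three separately analyzed elements.
themes = {
    "theme_a": {"grammar": 9, "spelling": 4, "thought": 5},
    "theme_b": {"grammar": 5, "spelling": 9, "thought": 6},
    "theme_c": {"grammar": 6, "spelling": 6, "thought": 9},
}

# Each judge weights the elements differently without stating the weights.
judges = {
    "grammar_minded": {"grammar": 0.7, "spelling": 0.2, "thought": 0.1},
    "spelling_minded": {"grammar": 0.2, "spelling": 0.7, "thought": 0.1},
    "thought_minded": {"grammar": 0.1, "spelling": 0.1, "thought": 0.8},
}


def general_effect(scores, weights):
    """Collapse the element scores into one undifferentiated impression."""
    return sum(scores[element] * weights[element] for element in weights)


def ranking(weights):
    """Order the themes from best to worst under one judge's weighting."""
    return sorted(
        themes,
        key=lambda name: general_effect(themes[name], weights),
        reverse=True,
    )


rankings = {judge: ranking(weights) for judge, weights in judges.items()}

for judge, order in rankings.items():
    print(judge, order)
```

Under these assumed weights each judge places a different theme first, although all three read identical material. The disagreement lies wholly in the unanalyzed weighting, which a judgment of general effect conceals.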

Thorndike 1 notes a similar analogy in language measurement. 
One inch, he says in effect, is said to be equal to another inch ulti- 
mately because of statistical evidence empirically derived by 
expert opinions. With respect to English writing, he says: "The 
only logical difference between equating the two lengths and equat- 
ing the two specimens of writing [is] that the variability of expert 
judges in the latter case is so great." This quite ignores, however, 
the many types of variability, not only in judges, but in English 
writing itself, a fact which makes the analogy exceedingly questionable.
An inch is an analytical unit. A comparison of inches
can be made only on the basis of exact knowledge of the analytical 
content of inches. This content is known by the judge to vary in 
only one possible way — namely, in length — while in true inches he 
knows that it varies not at all. The inch is an abstract and arti- 
ficially simple magnitude with which other inches can be put into 
simple correspondence with a possible variation of only one type, 
namely, length. To this an English theme is far from analogous. 
An English theme is concrete and, so far as can be determined, has 

1 E. L. Thorndike, "A Scale for Measuring the Merit of English Writing," Science, 
XXXIII, 935. 
not, like the inch, an absolute, analytically simple, objective
standard of merit to which an expert judge may refer for compari- 
son. The theme has a manifold of variables and of types of 
variability within it, left by Thorndike, as by Hillegas, unanalyzed. 
These as aggregates cannot be compared quantitatively with each 
other because they cannot be put into simple correspondence. 
While the judges of inches necessarily have a uniform scheme of 
valuations, the judges of themes, so long as the theme material 
remains generally complex, unanalyzed, unco-ordinated, and with- 
out exact definition, can have no uniform system of valua- 
tions. The difference is between specific analysis and general 
effect. 

General effect is the loosest and most nebulous of terms. Not 
only does it depend on various rhetorical parts which the science 
of rhetoric, if it be science, has long tried to analyze; it depends 
also on variable human receptors. For the purposes under con- 
sideration — namely, for the purposes of scale using as well as scale 
derivation — the rhetorical elements and the human receptor both 
remain unanalyzed and unevaluated. W. S. Monroe, in summar- 
izing recent work in composition in the Bureau of Educational 
Measurements and Standards at the Kansas Normal School, states 
that investigations show that chaos prevails in composition scale 
work owing to lack of adequate analysis. It may be safely said 
that the next future of the language scale will be an analysis of the 
elements necessary to the comparison of themes and to scale 
derivation; it will be the analytical phase.

Analyses of language with respect to its amenability to measure- 
ment are at present valuable, not so much in defining for scale use 
the final elements of language as in denoting some of the com- 
plexities and problems in this field that must be met by those who 
will progress to a scientific basis for composition scales. 

As a tentative definition for experimental purposes the following 
may be used: Success in the use of written language depends on 
the ability of a writer to relate his words to a reader in a way that 
affects the reader according to the writer's intention. This possibly 
may be termed a behaviorist or, better, a Freudian definition of 
language in so far as it attempts to eliminate the "transmission of
ideas" notion of language. The hypothesis of language as a symbolic
vehicle somehow capable of transporting a fixed content of
consciousness from one mind to another is here given up. This 
greatly disarranges the orthodox "form-content" analysis of 
language, and perhaps destroys it. It necessitates at least a 
treatment of the "form-content" relation from a different 
basis. 

The relations of language to the reader may be divided in 
general into two classes. The first class includes those formal 
elements of language, such as grammar, spelling, and punctuation, 
the relation of which to the reader is definite and precise and 
requires a response or judgment of merit, which, so far as it relates 
merely to these forms, is fixed and predictable of all literate persons. 
The second class includes, not only the elements of rhetorical form 
proper, but language-content, called the thought or the material 
which people say language transmits. The relation of the second 
class of elements to the reader is variable, resulting in an effect that 
is uncertain and relatively unpredictable owing either to inherent 
lingual plasticity or to ignorance of the complexity of the laws 
governing the case. Toward elements in this second class the 
reader's response or opinion of merit is only indirectly or tentatively 
predictable. 1 

These variable factors of language are the more important and
obviously the more difficult to identify and measure. The fixed
elements of language are easier to measure but have importance
only as instrumentalities.

Although language analysis as a basis for scales has advanced 
little beyond this primary division into the fixed and variable ele- 
ments of language effect, and that unconsciously, indications are 
that this is merely the beginning of the necessary process. Isola- 
tion of the fixed elements of language has proceeded to the extent 
that there are now punctuation scales, grammar scales, and spelling 

1 Future analysis of these variables may determine whether or not Thorndike
("Notes on the Significance and Use of the Hillegas Scale for Measuring the Quality
of English Composition," English Journal, II, 554) is right when he suggests that there
is ideally an objective absolute of merit for an English theme which standardizes all
others; and only analysis of these variables, not opinions of their aggregate, can
determine it.
scales. 1 Though it is somewhat doubtful whether spelling can be
included as an element of composition, since composition takes 
cognizance of no unit less than the word, its inclusion among the 
fixtures of language can cause no trouble, since it is always measur- 
able without reference to other elements of formal composition. 
It can be graded as to difficulty in an independent scale. The 
question concerning it is to what degree spelling ability should be 
included in an estimate of general language ability. Grammar and 
punctuation, unlike spelling, are closely correlated — so closely, 
indeed, that there is doubt whether they are essentially distinct. 
The measure of ability in punctuation is directly related to the 
complexity of grammar and grammatical sentence structure in any 
one instance. A child who uses complex but good grammar and 
fails in punctuating probably cannot be graded with a child whose 
elementary grammar makes punctuation easy. Spelling, punctua- 
tion, and grammar are all three, for that matter, related so inti- 
mately to the "thought-content" that theme measurement by 
scale becomes somewhat complicated. Undoubtedly some types 
of thought put more strain on these three elements of language than 
other types. 

The following remains true of spelling, punctuation, and gram- 
mar, however: it is known definitely when they are wrong and when 
they are right. Any one of these must be judged always by the
same criterion: its conformity to rule. Judging in this case, somewhat
similarly to judging inches, becomes, not a matter of choice
or personal selection, but a simple operation of noting deviations 
from a prescribed standard. The error, the nature of the error, 
and its correction are exact certainties. Literate persons know 
whether or not language is grammatical, properly spelled, and, 
usually, properly punctuated, and this response of theirs is fixed 
and in any special case definitely predictable. The only varia- 
bilities which these elements of language can show are those which 
break the rule. Based, as they are, on clear and arbitrary convention

1 For example, L. P. Ayres, The Measurement of Spelling Ability (New York:
Russell Sage Foundation); Daniel Starch, "The Measurement of Efficiency in
Spelling," etc., Journal of Educational Psychology, VI, 3; "The Measurement of
Achievement in English Grammar," ibid., VI, 615; Educational Measurements (New York:
Macmillan Co.).

without recourse to taste or choice, these three offer analytical
simplicity to the scale-maker. This obvious fixity of standards is 
an important consideration and is the basis of this one of the two 
classes of language elements. 
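
What "noting deviations from a prescribed standard" amounts to for a fixed element may be sketched briefly. The accepted word list and the sample sentence are hypothetical, invented for this illustration, and spelling is taken as the simplest case. Grammar and punctuation would need a far fuller statement of rule than is attempted here.

```python
# Hypothetical sketch: the word list and the sample sentence are invented.
import re

# The prescribed standard: a fixed set of accepted spellings.
ACCEPTED_SPELLINGS = {"the", "dog", "ran", "across", "field", "and", "barked"}


def spelling_deviations(text, accepted=ACCEPTED_SPELLINGS):
    """Return, in order, each word that departs from the accepted list."""
    words = re.findall(r"[a-z']+", text.lower())
    return [word for word in words if word not in accepted]


errors = spelling_deviations("The dog ran acros the feild and barked.")
print(errors)  # ['acros', 'feild']
```

Any two readers applying the same list must arrive at the same result. No such list can be written down for the variable elements, and there lies the difference between the two classes.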

It is in the class of variable language elements that the scale- 
maker meets, not only the most important, but by far the most 
difficult, problem. Grammar, punctuation, and spelling are in 
effect merely limitations imposed by general rule on the writer. 
They are taken for granted, and for that reason their correctness, 
which is absolute merit in this case, directly affects the reader but 
little. Success in affecting a reader in a way intended by a writer 
depends far more directly on a writer's ability in those variable 
elements of composition which depend, not on fixed rule, at least 
not clearly so, but on choice, on free judgment and taste. At this 
point language aims not merely to be correct. A transposition is 
made from a criterion of internal correctness to a criterion lying in 
the success in relating language in some intended way to the reader 
— in other words, in getting it across. An analysis of these variable 
effects on a reader must be based, if any scientific scheme of 
measurement is in view, on an analysis, and to some extent an 
isolation, of these variable elements in language. Measurement of 
these variables may be attempted by comparing the exact and 
defined effect on a reader with the exact and defined intended effect 
of a writer, thus obtaining the ratio expressing language efficiency. 
But this is made most difficult and probably impossible by the fact 
that there can be no exact knowledge of the scope of a writer's 
intended effect except by his language, nor can the actual effect on 
the reader be accurately gauged, and certainly not designated 
except by referring definitely to those elements of language through 
which the effect is made. Measurement of these variables, to 
repeat, must rely ultimately on an exact analytical evaluation of 
the elements of language itself. While the whole matter is exceed- 
ingly complex, perhaps hopelessly complex, progress will depend, 
at any rate, on the possibility of language analysis, particularly the 
analysis of the so-called variable elements. 



